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Abstract 

Supertree methods construct trees on a set of taxa (species) combining many smaller trees on the overlapping subsets of 
the entire set of taxa. A 'quartet' is an unrooted tree over 4 taxa, hence the quartet-based supertree methods combine many 
4-taxon unrooted trees into a single and coherent tree over the complete set of taxa. Quartet-based phylogeny 
reconstruction methods have been receiving considerable attentions in the recent years. An accurate and efficient quartet- 
based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum 
likelihood or Bayesian IVICMC analyses), without being as computationally intensive. In this paper, we present a novel and 
highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to 
evaluate the accuracy and scalability of our approach on both simulated and biological datasets. 
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Introduction 

A phylogenetic tree of a group of species (taxa) describes the 
evolutionary relationship among the species. The study of 
phylogeny not only helps to identify the historical relationships 
among a group of organisms, but also supports some other 
biological research such as drug and vaccine design, protein 
structure prediction, mxiltiple sequence alignment and so on [1]. 
The ultimate goal of this research community is to infer the Tree of 
Life, the phylogeny of all living organisms on earth, provided that 
it exists. 

Phylogenetic tree reconstruction by analyzing the molecular 
sequences of different species can be regarded as the sequence- 
based reconstruction of the phylogeny. Sequence-based phyloge- 
netic methods are basically of three types [1]: (a) distance-based 
methods, such as Neighbor Joining (NJ) [2], which has very fast 
practical performance; (b) heuristics for either Maximum-Likeli- 
hood (ML) [3] or Maximum-Parsimony (MP) [4], which are two 
NP hard optimization problems; and (c) the Bayesian Markov 
Chain Monte Carlo (MCMC) method, which, instead of a single 
tree, produces a probability distribution of the trees or aspects of 
the evolutionary history. Sequence-based methods are generally 
highly accurate. However, these methods are computationally 
intensive. As a result, these can only be applied on small to 
moderate sized datasets if we want to provide results having an 
acceptable level of accuracy within a moderate amount of time. 
For larger datasets (few hundreds of taxa (species)), these methods 
may need several weeks or months to provide results with an 
acceptable level of accuracy [I]. As the amount of molecular data 
is accumulating exponentially with the continuous advancement in 
sequencing technologies, scientists are facing new computational 
challenges to analyze these enormous amount of data. Therefore, 



we are forced to rely on supertree methods, where smaller trees on 
overlapping groups of species are combined together to get a single 
larger tree. Supertree-based tree construction is a two-phase 
method: in the first phase, many small trees on overlapping subsets 
of taxa are constructed using a sequence-based method; and in the 
next phase the small trees are summarized into a complete tree 
over the fuU set of taxa. 

Supertree methods are considered to be the likely solutions 
towards assembling the Tree of Life. Hence, these methods have 
drawn potential research interest in recent years. Supertree 
methods have two major motivations: frrstiy, it gives us the 
opportunity to achieve increased scalability and secondly, it is 
more suitable to combine the phylogenetic analyses on different 
types of data (e.g., molecular, morphological and gene-order data) 
or species groups. The careful design of supertree methods may 
allow us to work on very large (several hundreds taxa) datasets 
more accurately and easily. The most widely used supertree 
method is called the Matrix Representation with Parsimony 
(MRP) [5,6]. MRP encodes all the small trees into a matrix using 
the characters 0, 1 and ?. Then it uses Maximum-Parsimony (MP) 
[4] to get a tree from the data matrix. MRP is considered to be the 
most reliable supertree method to date. But since it uses an NP 
hard problem to analyze the data matrix, it is not efTicient for large 
datasets. 

Quartet amalgamation methods are supertree methods when 
each of the the small trees to be combined is a quatret, i.e., an 
unrooted tree having 4 taxa. Quartet is the most basic piece of 
unrooted phylogenetic information. Quartet-based phylogenetic 
inference has drawn significant attention from the research 
community, and numerous quartet-based methods have been 
developed over the last two decades. In this paper, we present a 
novel and highly accurate quartet amalgamation technique. We 
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conduct an extensive experimental study that demonstrates the 
superiority of our algorithm over QMC [7-9], which is known as 
the best quartet amalgamation method to date. 

With the intTcasing abundance of molecular data, constructing 
species trees from multilocus data has become the focus of 
attention. But combining data on multiple loci is not a trivial task 
due to the gene tree discordance [10-12]. The task is even more 
complicated with the striking recognition that the most probable 
rooted gene tree topology- (under a coalescent model [12-18]) need 
not match the species tree topology [19,20]. These are termed as 
Anomalous gene trees (AGTs). AGTs occur because not all tree 
topologies are equiprobable under the coalescent model 
[18,21,22]. In fact, rooted AGTs exist for any species tree with 
5 or more taxa. It has also been shown that rooted AGTs cannot 
occur with a three taxa and a symmetric four taxa species tree 
[19]. AGTs have also been studied for unrooted gene trees, and it 
has been observed that for a species tree with four taxa, the most 
probable rooted gene tree topologies have the same unrooted 
topology as the species tree [2.S]. This observation indicates that 
the most frequently occurring unrooted quartet is a consistent 
estimate of the unrooted species tree [23] . Thus, quartet based 
phylogeny can offer a sensible and statistically consistent approach 
to combine multilocus data, despite gene tree incongruence and 
AGTs [24,25]. Thus a highly accurate quartet amalgamation 
approach will help to design species tree estimation methods that 
are not susceptible to the gene tree discordance and AGTs. 
Notably, as has already been mentioned above, the other 
important advantage of quartet-based methods is that efficient 
design of such inference algorithm can be scalable to very large 
datasets (several hundreds or thousands of taxa). 

Previous Works 

Quartet-based phylogenetic tree reconstruction has been 
receiving extensive attention in the literature for more than two 
decades. Different approaches have been proposed and improved 
time to time. Among these, the most prominent approaches are, 
quartet puzzling (QP), quartet joining (QJ) and quartet max-cut 
(QMC). 

Quartet puzzling (QP) [26] infers the phylogeny of n sequences 
using a weighting mechanism. First, it computes the maximum- 
likelihood values for the three topologies on ever^' 4 taxa and uses 
these values to compute the corresponding probabilities. Using 
these probabilities as weights, the puzzling step constructs a 
collection of trees over n taxa. Finally it returns a consensus tree 
over w-taxa. TREE-PUZZLE [27] is a widely used program 
package that implements QP. In 1997, Strimmer et al. [28] 
extended the original QP algorithm by proposing three different 
weighting schemes, namely, continuous, binary and discrete. Later 
in 2001, Ranwez and Gascuel [29] proposed weight optimization 
(WO), an algorithm which is also based on weighted 4-trees 
inferred by using the maximum likelihood approach. WO uses the 
continuous weighting scheme defined in [28] and it searches for a 
tree on n taxa such that the sum of the weights of the 4-trees 
induced by this tree is maximal [29] . Unlike QP, WO constructs a 
single tree over n taxa; hence no consensus step is required. 
Though the speed and accuracy of WO are better than that of QP, 
its accuracy is lower than that of the methods based on 
evolutionary distances or maximum likelihood. Quartet joining 
(QJ) [30] was introduced in 2007 to overcome the limitations of 
Q_P and WO in outperforming the distance based methods. QJ 
provides the theoretical guarantee to generate the accurate tree if a 
complete set of consistent quartets is present. On average QJ 
outperforms QI* and its performance is very close to the 



performance of NJ [2], but QJ outperforms NJ on quartet sets 
with low quartet consistency rate [30]. 

In 2008, Snir et al. [7] proposed a new quartet-based method, 
short quartet puzzling (SQI'). The experimental studies in [7] 
shows that SQI' provides more accurate trees than QI', NJ and 
MP. It differs from the previous techniques in that it does not 
require all three topologies of the quartets on every 4 taxa. It is 
able to construct the output tree from a subset of all possible 
quartets as input. This is a two-phase technique: the first phase 
uses the randomized technique for selecting input quartets from all 
possible 4-trees (estimated using ML), and the second phase uses 
Quartet Max Cut (QMC) [7,8] technique for combining quartets 
into a single tree. The experimental study conducted by Swenson 
et al. [31] concludes that Q_MC performs better than the other 
supertree methods and MRP for smaller (100-taxon and 500- 
taxon) and high scaffold (i.e., high scaffold density) datasets. But 
MRP outperforms QMC and other supertree methods on larger 
and low scaffold (i.e., low scaffold density) datasets [31]. 
Subsequendy, Snir and Rao presented a fast and scalable 
implementation of QIVIC [9], where they reported the improve- 
ment of Q_MC over MRP in terms of accuracy and running time. 
Although MRP is the mostiy used supertree method in practice, 
the studies of [9,31] suggest that QIVIC is so far the best quartet- 
based supertree method. 

In this paper, we present a new quartet-based phylogeny 
reconstruction algorithm, Quartet FM (QI^M), which uses a 
bipartition technique inspired from the famous Fiduccia and 
Mattheyses (FM) algorithm for bipartitioning a hyper graph 
minimizing the cut size [32]. As will be reported later, QFM is 
highly accurate and scalable to large datasets (upto several 
hundreds of taxa). We demonstrate the accuracy of QFM by 
analyzing its performance on both simulated and biological 
datasets. We have compared our method on simulated datasets 
with Quartet MaxCut (QMC) [7-9], and showed the superiority 
of our method over QMC in terms of the accuracy of the 
estimated trees. To show the potential of our method, we also 
analyzed a real biological dataset containing 25 species from 4 
genera of birds (Amytomis, Stipiturus, Malurus and Clytomias). 
We have demonstrated a qualitative analysis of our results on real 
dataset based on the results of some rigorous previous studies on 
the same dataset. 

Problem Definition 

We address the problem of Maximum Quartet Consistency 
(MQC), which is a natural optimization problem. This problem 
takes a quartet set Q as the input and fmds a phylogenetic tree T 
such that the maximum number of quartets in Q become 
"consistent" with T (or T "satisfies" the maximum number of 
quartets). Now we formally define the problem. 

Problem 1 Maximum Quartet Consistency. 

Input: A multiset of quartets Q on a taxa set P. 

Output: A phylogenetic tree T on P such that T satisfies the 
maximum number of quartets of Q. 

The Maximum Quartet Consistency (MQC) problem is an NP- 
hard optimization problem [33]. Both exact and heuristic 
approaches are available for the MQC probknn in the lit(;rature 
[34] . The running time of an exact algorithm grows exponentially 
with the increase of number of taxa, since the number of possible 
trees grows more than exponentially with the number of taxa [35]. 
So for larger datasets we have to resort to the heuristic solutions. 
The focus of this work is on heuristic solutions for the MQC 
problem as we aim to build the phylogenetic tree for several 
hundreds of taxa. 
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Results 

We have conducted an extensive experimental study on both 
simulated and biological datasets. We have evaluated the accuracy 
of the trees estimated by QFM and compared the results to that of 
QMC [9]. QMC is the most accurate quartet amalgamation 
method developed to date, and was shown to be more accurate 
than MRP [9]. We have reported RF (Robinson Foulds) [36] rates 
of the estimated trees. RF rate is the mosdy used error metric, 
which is the ratio of the sum of the number of false positive and 
false negative edges to a factor 2m — 6, where n is the number of 
taxa [1]. The false positive (FP) and false negative (FN) edges are 
respectively, the edges which are absent in the true tree but present 
in the estimated tree, and the edges which are present in the true 
tree but absent in the estimated tree. 

Simulated Datasets 

To investigate the performance of our method on various model 
conditions, we have generated quartet sets, taken uniformly at 
random from model trees, by varying the number of taxa (n), the 
number of quartets [q) and the percentage of consistent quartets (c) 
with respect to the model tree (90% consistency level means that 
10% quartets are flipped to disagree with the model tree). We have 
generated model species trees with « = 25, 50, 100, 200, 300, 400 
and 500 taxa. To generate the model trees and the input quartet 
sets, we have used the tool developed and used in [9]. The tool 
takes as input the number of taxa (n), number of quartets {q) and 
the consistency level (c), and returns the quartet sets accordingly. 
For « = 25, 50, 100, we have generated and * quartets. 

We have not generated more quartets because quartets have 
been empirically shown to be enough to construct very accurate 
phylogenetic trees [9]. Although w''^ is a small number, we have 
chosen this size to test the performance of both methods on a 
comparatively smaller number of quartets as well. For 200, 300, 
400 and 500-taxon model trees, we have generated datasets with 
= and q = Tp'. For each size [q), we have varied the 
percentage of consistent quartets (c) by making it 70%, 80%, 
90%, 95% and 100%. Thus in total we have generated 
45 + 40 = 85 model conditions. To test the statistical robustness, 
we have generated 20 replicates of data for each of these model 
conditions. For each model condition, we report the average RF 
rate over the 20 replicates of data. We also report the standard 
error, given by 5/ y^iV) where S is the standard deviation and 
is the number of datapoints (which is 20 in our experiments). The 
standard errors are reported in Table SI and Table S2 in File SI. 
We have used WUcoxon signed-rank test with a = 0.05 to test the 
statistical significance of the differences between QFM and QMC. 
The results of the Wilcoxon T-test (^-values) are reported in Table 
S3 in File SI. 

Analyses on the Simulated Datasets 

We now present the results on the simulated datasets mentioned 
above. In each case, we have compared the average RF rate for 
the trees estimated by QFM and QMC. The results for c = 70%, 
c = 80%, c = 90% and c = 95% are summarized in Table 1. 
Figure 1 shows the bar charts comparing the values presented in 
Table 1. The results in Table 1 is presented in batches for 
different \ alucs of M as follows. For w = 25, 50, 100, we have three 
rows, one each for q = n^-^, q = n^ and q = n^'^. For « = 200, 300, 
400, 500 we have two rows, one each for q = n^-^ and q = rfi. The 
topmost row of each batch of Table 1 shows the results when 
q = n^'^ (from left to right, the consistency levels reported are 70%, 
80%, 90%, 95%, respectively). For tiiis (? = «'-^) case, both QMC 
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and QFM have performed poorly which implies that n''^ quartets 

are quite insufficient for accurate phylogeny reconstruction. This 
can be attributed to the fact that is a very small number 



compared to 3 x 




(i.e., the possible number of quartets). 



However, as the consistency level (c) increases, QFM starts to 
produce better trees than QMC; and very often the improvements 
of QFM over QMC are statistically significant (see Table S3 in File 
SI). This is very promising in the sense that, QFM can construct 
more accurate trees than Q_MC even with very small number of 
quartets. The second row of each batch of Table 1 shows the 
results with rP' quartets. With quartets, both QFM and QMC 
begin to produce better trees than that of n''^ quartets. However, 
quadratic number of quartets is stiU not sufficient for reconstruct- 
ing an accurate tree (which confirms the observation of [9]). But as 
before, QFM is statistically significantiy better than QMC in most 
of the cases. The bottom most row of the first three batches in 
Table 1 shows the results with quartets. In this case, both 
Q_FM and QMC reconstruct highly accurate species trees (error 
rates are close to zero) even with 70% consistent quartets. 

From these results, it is clear that QFM either matches the 
accuracy of QMC or (in most cases) produces better trees than 
QMC. QFM outperforms QMC in 55 cases out of the 68 model 
conditions shown in Table 1, and in 33 cases the differences are 
statistically significant (see Table S3 in File SI). QMC is better 
than QFM on only 5 cases, but the differences between the two 
methods are not statistically significant. For the rest 8 cases, both 
QFM and QMC have equal error rates (these are mostly the 
datasets with q = Tp''^ quartets where both of them have been able 
to reconstruct the true trees). 

We have also evaluated QFM and QMC on the noise-free 
model conditions, meaning that all the rjuartets are accurate 
(c= 100%). Table 2 demonstrates the results under the parameters 
(n,q) with c= 100%. Of the 17 model conditions analyzed, QFM 
has been found to be better than QMC on 10 cases, and the 
improvements are statistically significant in 5 cases (see Table S3 in 
File SI). QMC is better than QFM in two cases but the differences 
are not statistically significant. In 5 cases QFM and QMC have 
identical accuracy. 

Computational Issues 

We have evaluated the running time and memory' usage of 
QFM and QMC. On smaller datasets, both QFM and QMC run 
in few seconds. For example, on 25 taxa, QFM took between 3 
seconds to 20 seconds (depending on the number of quartets), and 
QMC took less than 2 seconds. Both of these methods are very fast 
on the datasets with up to 200 taxa and ^\ ith rjuartets: QFM 
took few minutes while QMC completed in few seconds. However, 
QFM is much slower than QMC on the larger datasets. For 
example, QFM took 1 1 hours for the largest datasets of our 
experiment with 500 taxa and 250000 quartets, while QMC took 
only one minute. We believe that this difference is due to the naive 
implementation of our algorithm. QMC has been implemented in 
a very efficient code, and it scales well on larger datasets. We are 
currentiy working on improving our implementation using 
advanced data structures. We are also parallelizing our divide 
and conquer based approach. 

We have also measured the memory usage by these methods. 
Both QFM and QMC are memory efficient and use only few 
megabytes of memory. For example, the peak memory usages by 
QMC and Q_FM on the datasets witii 500 taxa and 250000 
quartets are 10 MB and 21 MB, respectively. 
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number of taxa number of taxa number of taxa 

Figure 1. Average RF rates of QFWl and QMC on the simulated datasets. We show average RF rates (over 20 replicates of data) for each 
model condition. We varied the number of taxa (n), number of quartets (q) and the percentage of consistency level (c). For a particular value of ^ and 
c, the number of taxa is varied along the X-axis, the average RF rate is shown along the Y-axis, and the error bars represent the standard errors. From 
left to right: the number of quartets are and From top to bottom: 70%, 80%, 90% and 95% of the input quartets are consistent with the 

model species tree. We did not run our method on quartets when the number of taxa is more than 100, since these are computationally intensive 
and QFIVI could not be run within a reasonable time limit. IVloreover, these model conditions are less revealing and interesting since both QIVIC and 
QFIVl can reconstruct the true species trees with n^-^ quartets. 
doi:1 0.1 371 /journal. pone.01 04008.g001 



Analyses on the Avian Biological Dataset (Australo- 
Papuan Fairy-wrens) 

We have further evaluated the performance of QFM on a real 
avian biological dataset consisting of 25 birds. Since Avian 
phylogeny is considered to be hard to reconstruct, we have chosen 
this dataset as a good representative of real datasets. This dataset 
consists of 18 gene trees on 25 species representing 4 genera of 
birds {Amytornis, Stipiturus, Malurus and Clytomias) from 
Australo-Papuan avian family Maluridae, obtained from Tree- 
BASE [37]. This dataset has originally been used to study the 
efficacy of species tree methods at the family level in birds, using 
the Australo-Papuan Fairy-wrens (Passeriformes: Maluridae) clade 
[38] . Due to the presence of substantial amount of incomplete 
lineage sorting (ILS) [38], analyzing this family of birds is quite 
challenging. 

We have decomposed every gene tree into its induced quartets 
which is called embedded quartets [9,39]. Then, we have taken the 
union of all these quartets (multiple copies of a quartet have been 
retained). In this way we get 227,700 quartets. We have used these 
quartets to estimate a species tree using our method (QFM). We 
also ran QMC on this datasets. Both QFM and QMC returned 
the same tree. The tree is shown in Figure 2. 



Since we do not know the true trees for biological datasets, we 
have compared the result obtained from QFM with biological 
beliefs and other rigorous analyses. The tree returned by QFM 
(which is identical to the tree estimated by QMC) is quite 
interesting and consistent with the previous findings as discussed 
below. 

• QFM has been able to correctly identify the clusters associated 
with the four genera of birds. Also, it has placed the group of 
Amytornis birds as the sister to the rest of the family, and the group 
of Stipiturus birds as the sister to Malurus and Clytomias birds. 
These evolutionary relationships maintained by QFM are 
supported by the findings of the previous studies [38,40,41]. 

'Amytornis: Using aUozyme analysis, Christidis [41] has shown 
that A. harbatus is the earliest diverged lineage in the Amytornis 
genus. Same results have been obtained by a DNA sequencing 
study in [42]. The sequence-based analysis of Lee et al. [38] also 
have confirmed this. Our analyses with QFM also have found the 
same pattern. Lee et al. [38] also have shown that /I. housei should 
be within the textilis complex, which is confirmed by our QFM 
tree. 

• Stipiturus: Evolutionary relationships within the Stipiturus 
genus have been well studied [38,40,43]. Our study is consistent 
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Table 2. Comparison of QFM and QMC under the noise-free model conditions. 





n 


? 


Average RF rate 




c=100% 






QFM 


QMC 


25 


125 


0.444 


0.515 


25 


625 


0.056 


0.052 


25 


8208 


0 


0 


50 


354 


0.661 


0.666 


50 


2500 


0.140 


0.140 


50 


57164 


0 


0 


100 


1000 


0.777 


0.797 


100 


10000 


0.269 


0.274 


100 


398108 


0 


0 


200 


2829 


0.848 


0.881 


200 


40000 


0.424 


0.424 


300 


5197 


0.887 


0.907 


300 


90000 


0.506 


0.499 


400 


8000 


0.897 


0.930 


400 


160000 


0.5S4 


0.555 


500 


11181 


0.903 


0.937 


500 


250000 


0.590 


0.606 



Average RF rates of QFM and QMC over the 20 replicates of data under the noise-free model conditions (c = 100%). We varied the number of taxa in) and the number of 
quartets iq). Results are shown in bold face where QFM is better than QMC. 
doi:l 0.1 371 /journal.pone.Ol a4008.ta02 



with the previous findings: S. mallee and S. luficeps are closer to 
each other than they are to S. malachurus. 

• Clytomyias and Malurus: C. insignis was placed to Stipiturus 
species by [40]. However, in a more recent extensive multi-locus 
study, Lee et al. [38] argued that C. insignis is closer to M. grayi. 
Our study has also confirmed this fact. Also our study has 
confirmed their [38] findings that M. alhoscapulatus is closer to M. 
melanocephalus than to M. leiicopterus . 

Lee et al. [38] showed that ILS is liijely a general feature of the 
genetic history of these avian species. Since quartets are not prone 
to anomaly zone [19,23], quartet based analyses to resolve the 
avian history is of high importance. Interestingly, both QMC and 
QFM resolved the evolutionary history of these 1 8 birds similarly. 
Therefore, we believe that this tree should be considered as a 
reasonable hypothesis about the evolutionary history of this family 
of birds. 

Discussion 

In this work we have presented a novel and highly accurate 
quartet amalgamation technique, which we refer to as QFM. We 
have demonstrated the superiority of our method over QMC, 
which is known to be the best quartet amalgamation method to 
date. 

QFM is a new promising divide and conquer supertree method 
having an algorithmic appeal. We have conducted an extensive 
experimental study comparing QFM against QMC under 
different model conditions by varying difiFerent parameters. For 
almost all model conditions considered, QFM performs at least 
equal but in most cases better than QMC. In line with the 
experimental results shown in [9], we have found that quadratic 
sampling of quartets is not sufficient for accurate supertree 



construction. However, with quartets, both QFM and QMC 
can reconstruct very accurate trees indicating that it is possible to 
reconstruct an accurate supertree from large number of quartets, 
even with high amount of noise in the input data. QFM has also 
been tested on real biological datasets and has been shown to 
perform pretty well. The tree estimated by QFM has maintained 
the important evolutionary relationships despite the presence of 
incomplete lineage sorting. This is particularly interesting because 
this suggests that we can use quartet-based technique to develop 
species tree estimation method (from multi-locus data), which is 
less susceptible to gene tree incongruence due to ILS. 

Species tree estimation is frequendy based on phylogenomic 
approaches that use multiple genes from throughout the genome. 
However, combining data on multiple genes is not a trivial task. 
Genes evolve through biological processes that include deep 
coalescence (also known as incomplete lineage sorting (ILS)), 
duplication and loss, horizontal gene transfer etc. As a result the 
individual gene histories can differ from each other [10]. Species 
tree estimation in the presence of ILS is a challenging task. 
Moreover, anomalous gene trees (AGTs) make this task even more 
complicated [19,20]. It has been proven that AGTs cannot occur 
in quartets and thus the most probable quartets induced by the 
true gene trees represent the true species trees for the 
corresponding four species [19,23], Therefore, quartets can be 
used to design statistically consistent methods (methods that have 
the statistical guarantee to construct the true species tree given 
sufficiently large number of true gene trees) for constructing the 
species tree from gene trees (which evolve with ILS) as follows. 
First, we compute the quartets induced by the gene trees. For 
every four species, there are three possible quartets. Given 
sufficiently large number of true gene trees, the most probable 
quartets (the most frequently occurring quartets) on every four 
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Gerygone_olivacea 

Amytomis_barbatus 



Amytomis_dorotheae 

Amytomis_striatus 

Amytomis_merrotsyi 

Amytornisjextilis 

Amytornis_housei 

Amytomis_goyderi 

Amytomis_ballarae 
Amytornis_purnelli 



Stipiturus_malachunjs 

Stipitums_njficeps 
Stipiturus_mallee 
Malurus_grayi 
Clytomiasjnsignis 

Malurus_cyancicephalus 



Malurusjamberti 

Malunjs_pulcherrimus 

Malurus_elegans 
Malurus_amabilis 

IVIalurus_splendens 

Malurus_coFonatus 
^^^^^ Malurusjeucoptems 

1^^^^^^— Malums_melanocephalus 
Malums_alboscapulatus 

Figure 2. The 25 species avian phytogeny, representing 4 genera of birds from Maluridae family, estimated by QFM using the 
227,700 embedded quartets in 18 gene trees. The evolutionary relationships maintained by this tree are supported by the findings of the 
previous studies [38,40,41,43]. 
doi:10.1371/journal.pone.0104008.g002 



species represent the true species trees for those four species. Thus 
combining the most probable quartets to get a single and coherent 
species tree is an statistically consistent approach for species tree 
estimation. In this context, we can formalize the maximum 
weighted quartet satisfiability problem as follows. 

• Input: A set S of weighted quartets. 

• Output: The species tree T such that T maximizes the 
summation of the weights of the satisfied quartets in S. 

We can define the weight of a quartet q as the proportion of the 
gene trees that induce q. We can also incorporate the branch 
lengths in defining the weights. One major advantage of QFM is 
that it can readily be adapted to take a set of weighted quartets as 
input without making any change in its algorithmic constructs. 
Therefore, we think QFM is an important contribution to the 
phylogenomic analyses, in particular for estimating species trees 
from a set of gene trees where gene trees can be discordant from 
each other due to ILS. 

Another advantage of QFM lies in its flexibility in choosing the 
partition score function (see "Partition Score" section). QFM can 
be customized to take different scoring functions (i.e., s — v, sjv, 
etc.) without making any change in the algorithmic construct. We 
have observed that QFM may not give the same result for different 



scoring functions for the same dataset. So for different datasets, we 
may obtain better results by adapting difierent suitable scoring 
functions. Thus QFM provides us with the flexibility to change the 
scoring function as needed. In future we shall try to make our 
algorithm self-adaptable to the appropriate scoring function by 
analyzing different characteristics of the input datasets. Notably, as 
has already been discussed above, one shortcoming of the current 
implementation of QFM is that it is not as fast as QMC. 

Materials and Methods 

In this section we present our heuristic algorithm, namely, the 
Quartet FM (QFM) algorithm. Our algorithm employs a quartet 
based supertree reconstruction technique that involves a bipartition 
method inspired by the Fiduccia Mattheyses (FM) bipartition 
technique [32]. 

Basics 

A quartet {{A,B),(C,D)) is consistent with a tree T if in T, there 
is an edge (or path in general) separating A and B from C and D. 
For any four taxa, only one quartet (out of 3 possible quartets) will 
be consistent with a tree T . In Figure 3 among the three quartets. 
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T 

Figure 3. Quartet consistency with a tree T. Among the three quartets, only q\ = {{A, C), (B,D)) is consistent with T because T has an internal 

edge that separates taxa A and C from taxa B and D in T. 

doi:10.1371/journal.pone.0104008.g003 



quartet q\ is consistent with tree T as there exists an edge in T 
such that it separates A and C from B and D. Other two quartets 
are inconsistent with T as no such edge exists in T. 

A bipartition of an unrooted tree T is formed by taking any 
edge in T, and writing down the two sets of taxa that would be 
formed by deleting that edge. Let 7" be a tree over the taxa set P. 
Now, if we take an internal edge e oi T and delete e, then we get 
two subtrees, namely, Ta and Ti,. Let Pa and Pt be the sets of taxa 
of T„ and T), respectively. We shall denote such bipartition by (P^,, 
Ph). Thus an internal edge in T corresponds to a bipartition of P. 

A quartet q = {iA,B),{C,D)) is satisfied with respect to a 
bipartition (Pa,Pb) if taxa A and B reside in one part and taxa C 
and D reside in the other. A satisfied quartet is consistent with T . 
The quartet q is said to be violated with respect to a bipartition 
{Pa,Pb) when taxa A and C (or A and D) reside in one part and 
taxa B and D (or B and C) reside in the other part. On the other 
hand, q is said to be deferred with respect to a bipartition (Pa,Ph) if 
any three of its four taxa reside in one part and the fourth one 
resides in the other. 

A tree T over a taxa set P is said to be a star, if T has only one 
internal node and there is an edge from the internal node incident 
to each taxon teP. We shall refer to such a tree as a depth one tree. 

Divide and conquer approach 

We foUow a divide and conquer approach similar to QMC [7- 
9] . Let, 2 be a set of quartets over a set of taxa, P. We aim to 
construct a tree T on P, satisfying the largest number of input 
quartets possible. The divide and conquer approach recursively 
creates bipartition of the taxa set, where each bipartition 
corresponds to an internal edge in the tree under construction. 
QMC uses a heuristic bipartition technique which is based on 



finding a maximum cut (MaxCut) in a graph over the taxa set, 
where the edges represent the input quartets [9]. On the other 
hand, our algorithm uses a heuristic bipartition algorithm inspired 
by the famous Fiduccia and Mattheyses (FM) [32] bipartition 
algorithm. 

Divide. At each recursive step, we partition the taxa set P into 
two sets Pa and Ph- We shall describe the bipartitioning algorithm 
in "Method of Bipartition" section. After the algorithm partitions 
the taxa set, it augments both parts {Pa and Pj) with a unique 
dummy (artificial) taxon. This taxon will play a role while 
returning from the recursion. After the addition of the dummy 
taxon to the sets Pa and P/,, we subdivide the quartet set Q into 
two sets, Qa and Qb- A quartet set Qi takes those quartets 
({a,b),(c,d)) from Q such that either all four taxa a, b, c and d or 
any three thereof belong to P, (here ie{a,b}). In other words, 
satisfied or violated quartets with respect to the partition (Pa,Ph) 
are not considered to be included in either Qa or Q/,. Moreover, in 
every deferred quartet, where three taxa are in the same part, the 
other taxon is renamed by the name of the dummy taxon, and the 
quartet continues to the next step. Thus we get, two (Qi,Pi) pairs: 
{Qa,Pa) and (Qb,Ph}- We then recurse on both pairs (Qa,Pa) and 
(Qb-,Pb) if Qi is non-empty and |P,| > 3. If either Qi is empty or 
|P/| < 3, we return a depth one tree over the taxa set P,. 

Conquer. On returning from the recursion, at each step, we 
have two trees, Ta (corresponding to (Qa,Pa)) and 7/, (corre- 
sponding to (Qb,Ph))- These two trees are rerooted at the dummy 
taxon. Then the dummy taxon is removed from each tree and the 
two roots are joined by an internal edge. 

Figure 4 describes the high level divide and conquer algorithm. 
Let Q be the input quartet set and P be the corresponding taxa 
set. Assume that Q = {((I,2),(3,4)), ((I,3),(2,4)), ((2,3),(4,5)), 
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Quartets 






ql: 


((1,2),(3,4)) 


q4: 


((1,2),(5,6)) 


q2: 


((1,3),{2,4)) 


q5: 


((3,4),(5,6)) 


q3: 


((2,3),(4,5)) 


q6: 


((1,3),(5,6)) 



Set ofTaxa 
{1,2, 3,4, 5, 6} 




Figure 4. Divide and conquer approach. Divide: At eacin step, the input set of taxa of this step is partitioned into two sets and an unique dummy 
taxon is added to both sets. The input quartet set is then partitioned into two sets according to the bipartition of the set of taxa. So we get two (taxa 
set, quartet set) pairs, which are input to the successive divide steps. If at any step, the quartet set gets empty or the size of the taxa set becomes less 
than or equal to 3, a depth one tree over the taxa set is returned. Conquer: At each step, there are two trees corresponding to the divide calls initiated 
at this step. These two trees are joined on the dummy taxon introduced at this step during divide. For example, the leftmost two depth one trees, 
when returned to its caller, are joined on the dummy taxon Y . 
doi:1 0.1 371 /journal.pone.01 04008.g004 



((1,2),(5,6)), ((3,4),(5,6)), ((1,3),(5,6))}, and hence P= {1,2,3, 
4,5,6}. First, P is partitioned into two sets, Pa = {1,2,3, X} and 
Pa = {4,5,6,X} by using the bipartition technique described in 
"Method of Bipartition" section. Here, X is the dummy taxon. 
The bipartition (P„,Pf,) satisfies quartets 93 : ((2,3),(4,5)), 
qA : ((1,2),(5,6)) and : ((1,3), (5,6)) from Q. So these quartets 
will not be considered in the next level. Qa takes qi : ((1,2), (3,4)) 
and qi ■ ((1,3),(2,4)) as three of the taxa oiqi and q2 reside in Pa. 
We replace the taxon which does not belong to Pa with the 
dummy taxon X. Hence we get Qa = {((1,2),(3,X)),((1,3),(2,X))}. 
Similarly we get Qi, = {((X,4),(5,6))}. Next we recurse on (Qa,Pa) 
and (Qb,Pb), and Pa and Pt are partitioned further into (Pa^,Pat) 
and (Pi,^,Pi,i^), respectively. The partition {Pa^,Pat) satisfies 
((1,2),(3,X)) and violates ((1,3),(2,X)) in Qa and (Ph^Pbt) satisfies 
the only quartet in Q/,. So the quartet sets for the next level are 



empty and hence no more recursion is required. We return a depth 
one tree for each of the taxa sets Pa^,, Pat, Ph„ and P/,j. The returned 
trees are merged by removing the dummy taxon of that level and 
joining the branches of the dummy taxa. In Figure 4, the upper half 
shows the divide steps. The depth one trees are returned when no 
more recursion is required. The lower haJf of Figure 4 shows how 
the trees are returned and merged as the recursion unfolds (conquer 
step). Thus we get the final merged tree (((1,2),3),(4,(5,6))) (shown 
at the bottom of Figure 4) satisfying 5 quartets in total. The satisfied 
quartets are q\, q^, q^, qs and qe- 

Method of Bipartition 

The most crucial part of our algorithm is the bipartition (divide 
step) technique. Here, we differ from QMC [7-9] and adopt a new 
bipartition technique inspired by the famous Fiduccia and 



PLOS ONE I www.plosone.org 



9 



August 2014 | Volume 9 | Issue 8 | el 04008 



Phylogenetic Tree Estimation from Quartets 












Gain(l)=-4 


4 \ 


Gain(2)=-2 


5 


Gain(4)=0 o 


6 / 


Gain(5)=-4 




Gain(6)=-3 




locked fragment 



Gain(6)=2 <i 



' Pa, 










( 2 \ 
(3) 


\ 5 1 
1 6 1 


Gain(l)=-2 
Gain(2)=-2 o 
Gain(5)=-3 
Gaiii(6)=-3 


v®y 


/©\ 

@ 
I ^ / 

Vv 






f \ 

Pa, Pb, 












Gain(l) 
Gain(5) 
Gain(6) 


=-1 <1 

=-2 
=-3 













Gain(5)=-1 <i 
Gain(6):=-2 



Figure 5. An example iteration of the Bipartition Algorithm WIFWl. The locl<ed taxa are sinown in circles. At each step, the taxon which has the 
maximum gain and will be transferred from its current partition to the other is indicated by a left arrow. _P/,„) is the initial bipartition of this 
iteration. Initially all taxa are free (i.e, not locked). The gain is computed for each free taxon of this step and the taxon (which is 3 here) with maximum 
gain is transferred from its own partition to the other partition. Thus we get partition (/"„,, /";,,), where 3 is a locked taxon. In this way, only one taxon 
is locked at a step and once a taxon is locked, it remains locked throughout the iteration. An iteration completes when all taxa get locked. Here, all 
taxa get locked at (Pa^, Ph^- 
doi:1 0.1 371 /journal.pone.01 04008.g005 



Mattheyses (FM) algorithm for bipartitioning a hyper graph 
minimizing the cut size [32]. In divide and conquer based 
phylogenetic tree construction, the bipartition of the taxa set 
corresponds to an internal edge of the tree under construction. An 
internal edge, in turn, plays a role to make quartets to be satisfied 
or violated against the bipartition. So we adopt a different 
bipartition technique from that used in QMC, with an objective to 
get better results. 

Our bipartition algorithm takes a pair of taxa set and a quartet 
set (P, Q) as input. It partitions P into two sets, namely. Pa and P\, 
with an objective that (Pa, Ph) satisfies the maximum number of 
quartets from Q. The algorithm starts with an initial partition and 
iteratively searches for a better partition. We wiU use a heuristic 
search to find the best partition. Before we describe the steps of the 
algorithm, we describe the algorithmic components. 

Partition Score. We assess the quality of a partition by assigning 
dLpartition mm. We use a scoring function, Score{P„,Pi,,Q), such that 
the higher score will indicate a better partition. This flinction checks 
each qeQ against the partition (Pa,Pb) and determines whether q is 
satisfied, violated or deferred. We define the score function in terms 
of the number of satisfied and violated quartets. Let s and v denote 
the number of satisfied and violated quartets. Then, two natural 
ways of defining the score function are: 1) taking the difference 
between the number of satisfied and violated quartets [s — v), and 2) 
taking the ratio of the number of satisfied and violated quartets 



{s/\ + v). As num In this paper, we used j — v as the score function. 
We can also use some other complicated score functions defined in 
terms of the number of satisfied, violated and deferred quartets (i.e., 
Score(Pa,Pb,Q)=f(s,v,d), where d denotes the number of 
deferred quartets). In our preliminary experimental study, we have 
explored different score functions and observed that s — v gives 
better performance in most of the cases. Notably, although in some 
cases other functions (e.g., s/l + v, s — v + s/d + s/l + v) achieve 
better results than s — v (results are not shown in this paper), none of 
them is consistentiy better than s — v. 

Gain Measure. Let (Pa,Ph) be a partition of set of taxa P. 
Let teP be a taxon and witiiout loss of generality we assume that 
tePa. Let {Pa, Pb ) be the partition after moving the taxa t from Pa to 
Pi, . That means, = Pa~ t, and Pj =Pb^t. Then we define the 
gain of the transfer of the taxon t with respect to (Pa,Pb), denoted by 
Gain(t,(Pa,Pb)), as Score(P;„ P,,', Q)- Score{P„,Ph,Q). 

Singleton Bipartition. A bipartition (Pa,Ph) of P is singleton 
if l^fll =1 or \Pb\ =1. In our bipartition algorithm, we keep a 
check for the singleton bipartition. We do not allow our bipartition 
algorithm to return a singleton bipartition to avoid the risk of an 
infinite loop. 

Algorithm. Now we describe the bipartition algorithm which 
we call MFM (Modified FM) Bipartition Algorithm. Let, (P, Q) be 
the input to the bipartition algorithm, where Pbeasetof taxa and 
2 be a set of quartets over the taxa set P. We start with an initial 
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The log table corresponding to the iteration shown in Figure 5. Here k represents the step number. The input partition to step k is {Pat_,, Ph^,)- The second column 
shows the taxon that has the maximum gain at the corresponding step, and the third column shows the corresponding maximum gain. The fourth column shows the 
cumulative gain of the gains listed in the third column. We observe that the cumulative gain gets maximum (2) after moving taxon 3 in step 1. So all the subsequent 
moves of taxa are rolled back. The resultant partition of this iteration is ) = ({1,2, 3}, {4. 5, 6}), which is the initial partition for the next iteration of the iteration in 

Figure 5. 

doi:l 0.1 371 /journal.pone.Ol 04008.t003 



bipartition (Pa„,Pbi,) of P. The initial bipartitioning is done in four 
steps. 

• Step 1: We count the frequency of each distinct quartet in Q. 

• Step 2: We then sort Q by the frequency count of tlie quartets 
in a decreasing order. 

• Step 3: Suppose after sorting Q = q\,q2, ■ ■ ■ ,qm, where 
m= \Q\. Now we consider the quartets one by one in the sorted 
order. Initially both P^o and Pi,„ are empty. 

Let q = {{t\,t2),{h^U)) be a quartet in Q. If none of the 4 taxa 
belongs to either P„^ or P^^, then we insert t\ and t2 in Pc,„ and ?3 
and ?4 in Pi,„ . Otherwise, if any of the 4 taxa exists in either P^^ or 
Phf, we take the following actions to insert a taxon which doest not 
exist in P„j or P/,^. We maintain an insertion order. We consider 
ti, t2, f3 and /4 respectively. 

- To insert t\ , we look for the partition of (if ?2 exists in any 
part) and insert ti into that partition. But if ?2 does not exist in 
either of the partitions, then we look for the partition of either or 
?4 (either of these two must exist in Pa„ or P;,„) and insert ?i into 
the other partition. 

- To insert ?2, we look for the partition of fi and insert ?2 into 
that partition. 

- To insert , we look for the partition of Z4 (if ?4 exists in any 
part) and inset ^3 into that partition. But if does not exist in 
either of the partitions, then we look for the partition of either ti or 
t2 and insert ?3 into the other partition. 

- To insert t^, we look for the partition of ?3 and insert tn into 
that partition. 

Table 4. Gain Summary. 



• Step 4: When we insert a taxon / to any part, we remove it 
from P. After considering each qeQ and inserting taxa accord- 
ingly, if P remains non-empty, we insert the remaining taxa to 
either part randomly. 

Obtaining (Pa„,Pio), we search for a better partition iteratively. 
At each iteration, we perform a series of transfers of taxa from one 
partition set to the other to maximize the number of satisfied 
quartets. At the beginning of an iteration, we set the status of aU 
the taxa as free. Then, for each free taxon teP, we calculate 
Gain(t,(Pa„,Pb„)), and find the taxon ti with the maximum gain. 
There can be more than one taxa with the maximum gain where we 
need to break the tie. We will discuss this issue later. Next we 
transfer ti and set the status of this taxon as locked in the new 
partition that indicates that it will not be considered to be 
transferred again in this current iteration. This transfer creates the 
first intermediate bipartition (Pa, ,P|i,, ). The algorithm then finds the 
next free taxon ?2 with the maximum gain with respect to (Pa, ,P/,, ), 
and transfer and lock that taxon to create another intermediate 
bipartition (P^j jPfe^). Thus we transfer all the free taxon one by one. 
Let Q be the input quartet set and P be the corresponding taxa set. 
Assume that Q = {((1,2),(3,4)), ((1,2),(5,6)), ((1,3),(2,4)), 
((3,4),(5,6)), ((2,3),(4,5)), ((1,3),(5,6))} (same as used in Figure 4). 
Hence, P= {1,2,3,4,5,6}. Following the steps of the initial biparti- 
tion, we get the initial bipartition Pa„ = {1,2} and P/,„ ={3,4,5,6}. 
Figure 5 shows the first iteration of the bipartition algorithm for this 
particular example. 
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The log table corresponding to the next iteration of the iteration shown in Figure 5. Here k represents the step number. The input partition to step k is Phk-i)- 
The second column shows the taxon that has the maximum gain at the corresponding step, and the third column shows the corresponding maximum gain. We observe 
that the cumulative gain gets maximum {0} at step 1. So we rollback all the subsequent moves including the move at step 1 and return the initial partition 
({1,2, 3}. {4, 5, 6}) of this iteration as the resultant bipartition of the bipartition algorithm. No more iteration is needed as the maximum cumulative gain of the current 
iteration is not greater than zero. 
doi:1 0.1 371/journal.pone.01 04008.t004 
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Suppose that the taxa are locked in the following order: 
{ti,t2, ■ ■ ■ Jn}- That is, ti has been locked first, then t2, tj, and so 
on. Let, the gain values of the corresponding partitions are: 

Gain {h ,{PaQ ,Pbo)\ ■ ■ ■ 'Gain {t„XPa„_i ,Pb„_ i ))• 



Now we define the cumulative gain up to the kih transfer as 

k 

CGain{k) = ^ Gain(ti,{Pa._ ^ ,Pbi_ i ))• 

;=1 

The maximum cumulative gain, MCGain(t\,t2, ■ ■ ■ ,tn) is 
defined as 

MCGain{{t\,t2, . . . ,t„})=maxi < <„CGain{i). 

In each iteration, the algorithm finds the current ordering 
{{ti, t2,--- Jn}) of the transfers and saves this order in a log table 
along with the cumulative gains (see Table 3 for example). Let be 
the taxonin the log table corresponding to MCGain{{ti,t2, ■ ■ ■ ,tn})- 
This means that we obtain the maximum cumulative gain after 
moving the mth taxon (with respect to the order stored in the log 
table). Then we rollback the transfers of the taxa (;„,_|_i, . . . ,t„) that 
were moved after t^- Let the resultant partition after these rollbacks is 
{Pa,Pb)- This partition will be the initial partition for the next 
iteration. In this way, the algorithm continues as long as the 
maximum cumulative gain is greater than zero and returns the 
resultant bipartition. Table 3 lists the order of locking, corresponding 
gain and cumulative gain with respect to the iteration illustrated in 
Figure 5. From Table 3 we note that we get the maximum 
cumulative gain, 2, after moving taxon 3. Here, we also get the 
maximum value of cumulative gain after moving taxon 4. We break 
the tie arbitrarily. We consider the taxon for which we get the 
maximum cumulative gain for the first time. For this example, we get 
the maximum cumulative gain of 2 at taxon 3 for the first time. So we 
rollback all the subsequent moves. The resultant partition after this 
rollback is ({1,2,3}, {4,5,6}) (partition iPa,,Pb,) in Figure 5). 
Similarly, Table 4 lists the ordering of locking, corresponding gain 
and cumulative gain with respect to the iteration which follows the 
iteration illustrated in Figure 5. From Table 4 we get that the 
maximum cumulative gain is 0. So the moves are roUed back and we 
get the final resultant partition ({1, 2,3}, {4,5, 6}). 

As we have mentioned earlier, we do not aUow any transfer of 
taxa that results into a singleton bipartition. Therefore, we need to 
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add some additional conditions. Also, there could be more than 
one free taxa with the maximum gain, where we need to decide 
which one to transfer. We consider the following cases to address 
these issues. Let, M he a set of free taxa with the maximum gain. 

• Case 1: \M\ > 1 and at least one corresponding bipartition is 
not singleton. That means, there exists teM such that transfer of t 
does not result into a singleton bipartition. Let M* he the set 
of taxa, that can be safely transferred without resulting in a 
singleton bipartition. Note that, \M*\ > 1. If = 1, we transfer 
the taxa teM* . Otherwise, we have more than one taxa in M* . In 
that case, we pick the taxon teM* , for which the corresponding 
bipartition (after transferring t) satisfies maximum number of 
quartets (note that every taxa in M* has the same gain, but the 
corresponding bipartitions do not necessarily satisfy the same 
number of quartets). In the case of a tie, we choose one taxon at 
random. 

• Case 2: \M\ > 1 and transfer of each teM results in a singleton 
bipartition. In this case, we consider the set of taxa with the second 
highest maximum gain. Let M' be the set of free taxa with the 
second highest maximum gain. We then recursively check 'Case 1' 
and 'Case 2' on M'. If we cannot find a taxon that can be 
transferred without resulting into a singleton bipartition, we make 
the status of all the free taxa locked and set their gain to zero. 

At each divide step we have a iP,Q) pair as input. The 
bipartition algorithm returns a bipartition (Pa,Ph) of the taxa set 
P. We then divide Q into 2« 'ind Qi, and obtain {Pa,Qa) and 
(Pb'Qb) pairs. Pa and Pb wiU be further bipartitioned in 
subsequent divide steps. The pseudo-code of the bipartition 
method MFM is given in Table S4 in File S 1 . Moreover, the run 
time analyses of Algorithm MFM is described in File SI. 

Supporting Information 

FUe SI Supplementary material. Additional tables, and the 
pseudocode and time complexity of MFM bipartition algorithm 
are presented. 
(PDF) 
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